Source-Error Aware Phrase-Based Decoding for Robust Conversational Spoken Language Translation

Authors

  • Sankaranarayanan Ananthakrishnan
  • Wei Chen
  • Rohit Kumar
  • Dennis Mehay
Abstract

Spoken language translation (SLT) systems typically follow a pipeline architecture, in which the best automatic speech recognition (ASR) hypothesis of an input utterance is fed into a statistical machine translation (SMT) system. Conversational speech often generates unrecoverable ASR errors owing to its rich vocabulary (e.g. out-of-vocabulary (OOV) named entities). In this paper, we study the possibility of alleviating the impact of unrecoverable ASR errors on translation performance by minimizing the contextual effects of incorrect source words in target hypotheses. Our approach is driven by locally-derived penalties applied to bilingual phrase pairs as well as target language model (LM) likelihoods in the vicinity of source errors. With oracle word error labels on an OOV word-rich English-to-Iraqi Arabic translation task, we show statistically significant relative improvements of 3.2% BLEU and 2.0% METEOR over an error-agnostic baseline SMT system. We then investigate the impact of imperfect source error labels on error-aware translation performance. Simulation experiments reveal that modest translation improvements are to be gained with this approach even when the source error labels are noisy.
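The abstract does not give the form of the penalties, so the following is only an illustrative sketch of the general idea of penalizing phrase pairs near source errors. It assumes a per-word boolean error label and subtracts a fixed weight from a phrase pair's log-score for each flagged source word the pair covers. The function names, the `weight` parameter, and the linear penalty are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def count_covered_errors(error_labels: List[bool], span: Tuple[int, int]) -> int:
    """Count flagged source words inside the half-open span [start, end)."""
    start, end = span
    return sum(1 for is_error in error_labels[start:end] if is_error)


def penalized_score(
    base_log_score: float,
    error_labels: List[bool],
    span: Tuple[int, int],
    weight: float = 1.0,
) -> float:
    """Hypothetical error-aware score for one phrase pair.

    Assumption (not from the paper): the penalty grows linearly with the
    number of source words in the span that are labeled as ASR errors.
    """
    return base_log_score - weight * count_covered_errors(error_labels, span)


# Source hypothesis of three words; the middle word is labeled as an ASR error.
labels = [False, True, False]
# A phrase pair covering words 0-1 includes the error and is penalized.
score_with_error = penalized_score(-2.0, labels, (0, 2), weight=0.5)
# A phrase pair covering only word 2 is left unchanged.
score_clean = penalized_score(-2.0, labels, (2, 3), weight=0.5)
```

In a real decoder such a term would be one feature among many in the log-linear model, with its weight tuned alongside the others; the analogous adjustment to LM likelihoods mentioned in the abstract is not sketched here.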

Similar Resources

Semi-Supervised Word Sense Disambiguation for Mixed-Initiative Conversational Spoken Language Translation

Lexical ambiguity can cause critical failure in conversational spoken language translation (CSLT) systems due to the wrong sense being presented in the target language. In this paper, we present a framework for improving translation of ambiguous source words that (a) constrains statistical machine translation (SMT) decoding with phrase pair clusters to select a desired sense for translation; (b...

NiuTrans: An Open Source Toolkit for Phrase-based and Syntax-based Machine Translation

We present a new open source toolkit for phrase-based and syntax-based machine translation. The toolkit supports several state-of-the-art models developed in statistical machine translation, including the phrase-based model, the hierarchical phrase-based model, and various syntax-based models. The key innovation provided by the toolkit is that the decoder can work with various grammars and offers...

Augmenting Translation Models with Simulated Acoustic Confusions for Improved Spoken Language Translation

We propose a novel technique for adapting text-based statistical machine translation to deal with input from automatic speech recognition in spoken language translation tasks. We simulate likely misrecognition errors using only a source language pronunciation dictionary and language model (i.e., without an acoustic model), and use these to augment the phrase table of a standard MT system. The a...

IBM spoken language translation system evaluation

We discuss performance-enhancing techniques for phrase-based statistical machine translation that have proven effective for Japanese-to-English and Chinese-to-English translation of the BTEC corpus. We also address some issues that arise in conversational speech translation quality evaluations.

Lightly-Supervised Word Sense Translation Error Detection for an Interactive Conversational Spoken Language Translation System

Lexical ambiguity can lead to concept transfer failure in conversational spoken language translation (CSLT) systems. This paper presents a novel, classification-based approach to accurately detecting word sense translation errors (WSTEs) of ambiguous source words. The approach requires minimal human annotation effort, and can be easily scaled to new language pairs and domains, with only a wordal...

Publication date: 2013